Cross-Genre Author Profile Prediction Using Stylometry-Based Approach
Authors
Abstract
The author profiling task aims to identify traits of an author by analyzing their written text. This study presents a stylometry-based approach for detecting author traits (gender and age) in cross-genre author profiles. The proposed approach uses several types of stylistic features: 7 lexical features, 16 syntactic features, 26 character-based features and 6 vocabulary richness features (56 stylistic features in total). On the training corpus, the proposed approach obtained an accuracy of 0.787 for gender, 0.983 for age and 0.780 for both (jointly detecting age and gender). On the test corpus, the proposed system achieved an accuracy of 0.576 for gender, 0.371 for age and 0.256 for both.
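To make the four feature families named in the abstract more concrete, the following is a minimal sketch of what extracting a few stylistic features from a document might look like. The abstract does not list the 56 features the authors used, so the function name `stylometric_features` and every individual feature below (average word length, comma ratio, type-token ratio, and so on) are illustrative assumptions, not the paper's actual feature set.

```python
import re
from collections import Counter


def stylometric_features(text: str) -> dict:
    """Compute a small, illustrative set of stylistic features for one document.

    The features are hypothetical examples of each family mentioned in the
    abstract; they are not the feature set used in the paper.
    """
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_chars = len(text)
    n_words = len(words)
    counts = Counter(words)

    def ratio(numerator: float, denominator: float) -> float:
        # Guard against empty documents.
        return numerator / denominator if denominator else 0.0

    return {
        # Lexical features (assumed examples)
        "avg_word_length": ratio(sum(len(w) for w in words), n_words),
        "avg_sentence_length": ratio(n_words, len(sentences)),
        # Syntactic features, approximated here by punctuation usage
        "comma_ratio": ratio(text.count(","), n_chars),
        "question_mark_ratio": ratio(text.count("?"), n_chars),
        # Character-based features (assumed examples)
        "digit_ratio": ratio(sum(ch.isdigit() for ch in text), n_chars),
        "uppercase_ratio": ratio(sum(ch.isupper() for ch in text), n_chars),
        # Vocabulary richness features (assumed examples)
        "type_token_ratio": ratio(len(counts), n_words),
        "hapax_ratio": ratio(sum(1 for c in counts.values() if c == 1), n_words),
    }


if __name__ == "__main__":
    for name, value in stylometric_features("Hello, world! Hello again.").items():
        print(f"{name}: {value:.3f}")
```

A feature dictionary like this would typically be turned into a fixed-length numeric vector per document and passed to a classifier for gender and age; which classifier the authors used is not stated in the abstract.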
Similar resources
Exploring the Effects of Cross-Genre Machine Learning for Author Profiling in PAN 2016
Author profiling deals with the study of various profile dimensions of an author such as age and gender. This work describes our methodology proposed for the task of cross-genre author profiling at PAN 2016. We address gender and age prediction as a classification task and approach this problem by extracting stylistic and lexical features for training a logistic regression model. Furthermore, w...
A Machine Learning-based Intrinsic Method for Cross-topic and Cross-genre Authorship Verification
This paper presents our approach for the Author Identification task in the PAN CLEF Challenge 2015. We identified this year’s challenges as the limited amount of training data and the fact that the problems in the sub-corpora are independent in terms of topic and genre. We adopted a machine learning based intrinsic method to verify whether a pair of documents have been written by same or different au...
Overview of PAN'16 - New Challenges for Authorship Analysis: Cross-Genre Profiling, Clustering, Diarization, and Obfuscation
This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of digital text forensic research. PAN 2016 comprises three shared tasks: (i) author identification, addressing author clustering and diarization (or intrinsic plagiarism detection); (ii) author profiling, addressing age and gender prediction from a cross-genre persp...
Profile-based Approach for Age and Gender Identification
This paper describes the joint participation of the LIDIC research group of the UNSL from Argentina and the Language and Reasoning research group of the UAM Cuajimalpa from Mexico in the PAN 2016 Author Profiling task. For the proposed method we adopted a profile-based approach, which has been successfully applied to the Authorship Attribution problem. Thus, we proposed a variation of this tec...
Deep Level Lexical Features for Cross-lingual Authorship Attribution
Cross-lingual document classification aims to classify documents written in different languages that share a common genre, topic or author. Knowledge-based methods and others based on machine translation deliver state-of-the-art classification accuracy; however, because of their reliance on external resources, poorly resourced languages present a challenge for these types of methods. In this paper...